93 research outputs found

    Atomic broadcast: a fault-tolerant token-based algorithm and performance evaluations

    Within only a couple of generations, the so-called digital revolution has taken the world by storm: today, almost every human being interacts, directly or indirectly, with a computer system at some point in their life. Computers are present on our desks, computer systems control antilock braking and stability control in cars, and they collect usage statistics in elevators in order to anticipate maintenance and repair operations. Computer systems also operate critical infrastructure, such as nuclear power plants, airplane control systems and space rockets. Furthermore, computer systems are not only omnipresent but also increasingly networked. As the use of computer systems has increased dramatically over the past decades, the needs and expectations associated with these systems have increased as well. In particular, one of the critical properties of a system is its availability (the fraction of time during which the system provides a service to its users): the costs and negative publicity of an outage (of a commercial web site or a stock exchange, for example) are often considerable. Fault tolerance is one approach to designing a highly available system: a fault-tolerant system is designed so that the failure of one of its components does not compromise the functionality of the system as a whole. Replication is a common fault tolerance technique. Instead of a single machine (a replica) providing a service, the system is composed of several replicas that run the service and are connected through a network. If one of the replicas fails, the service is still provided by the remaining replicas. Replication is attractive because it can be implemented in software running on commodity hardware, thus avoiding the high cost of special-purpose hardware. 
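As a back-of-the-envelope illustration of why replication raises availability, the sketch below computes the availability of a service that stays up as long as at least one replica is up. It assumes that replicas fail independently and that a single surviving replica is enough to serve requests; neither assumption comes from the thesis, and correlated failures (shared power, shared software bugs) make real figures worse. The function name `system_availability` is hypothetical.

```python
def system_availability(replica_availability: float, n_replicas: int) -> float:
    """Availability of a service that is up while at least one replica is up.

    Assumes independent replica failures and no quorum requirement
    (one surviving replica suffices). Illustrative sketch only.
    """
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    if n_replicas < 1:
        raise ValueError("need at least one replica")
    # The service is down only if every replica is down at the same time.
    all_down = (1.0 - replica_availability) ** n_replicas
    return 1.0 - all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(system_availability(0.99, n), 6))
```

Under these assumptions, each extra replica with 99% availability multiplies the probability of a total outage by 0.01.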
Replication, although intuitive to understand, is complex to implement in practice, as the replicas have to interact in order to keep the system as a whole consistent. Group communication simplifies the replication problem by hiding issues such as the communication between the replicas, the crash of one or several replicas, and the synchronization of the replicas. In this thesis, we start by comparing two replication techniques, group communication and quorum systems, and identifying the cases in which each technique should be used. Atomic broadcast (a group communication primitive at the heart of this work) allows replicas to broadcast messages to each other and then deliver them in the same total order, even if replicas broadcast messages almost simultaneously. Atomic broadcast is especially useful for replication: since all replicas deliver messages in the same order, their state is kept consistent. After the comparison between the replication techniques, we present an atomic broadcast algorithm that is designed to perform well when the system is heavily loaded and to detect crashed replicas quickly (by minimizing the consequences of wrongly suspecting a replica that has not crashed). The presentation of the algorithm includes simulation results comparing the performance of the new algorithm to previously proposed atomic broadcast algorithms. The second part of the thesis focuses on the experimental performance evaluation of the new algorithm in several settings. We start by comparing four atomic broadcast algorithms in a local area network. We then compare three of the four algorithms in a wide area network with sites in Switzerland, Japan and France, where the round-trip time between sites varies between 4 and 300 ms. Finally, we evaluate the impact of the size of the system (the number of replicas) on the performance of the algorithms.
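A minimal sketch of the consistency argument: if every replica delivers the same messages in the same order, deterministic replicas end up in the same state even when messages arrive in different orders. It assumes the total order has already been decided elsewhere (each message carries a unique, gap-free sequence number) and only shows the delivery side, using a hold-back queue. This is not the thesis's algorithm; `Replica`, `receive` and the add/mul operations are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Replica that delivers messages strictly in sequence-number order.

    Assumes every message carries a unique, gap-free sequence number
    assigned by an ordering mechanism that is not modelled here.
    """
    state: int = 0
    next_seq: int = 0
    holdback: dict[int, tuple[str, int]] = field(default_factory=dict)
    delivered: list[int] = field(default_factory=list)

    def receive(self, seq: int, op: str, arg: int) -> None:
        # Buffer the message; deliver as long as the next number is present.
        self.holdback[seq] = (op, arg)
        while self.next_seq in self.holdback:
            pending_op, pending_arg = self.holdback.pop(self.next_seq)
            self.apply(pending_op, pending_arg)
            self.delivered.append(self.next_seq)
            self.next_seq += 1

    def apply(self, op: str, arg: int) -> None:
        # Non-commutative operations: the final state depends on the order.
        if op == "add":
            self.state += arg
        elif op == "mul":
            self.state *= arg
        else:
            raise ValueError(f"unknown op {op!r}")


messages = [(0, "add", 2), (1, "mul", 10), (2, "add", 3)]

r1, r2 = Replica(), Replica()
for m in messages:            # r1 receives the messages in order
    r1.receive(*m)
for m in reversed(messages):  # r2 receives them in reverse order
    r2.receive(*m)

print(r1.state, r2.state)
```

Both replicas apply add 2, mul 10, add 3 and reach 23; applying the operations in arrival order instead would have left `r2` at 32.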

    On the Cost of Modularity in Atomic Broadcast

    Modularity is a desirable property of complex software systems, since it simplifies code reuse, verification, maintenance, etc. However, the use of loosely coupled modules introduces a performance overhead. This overhead is often considered negligible, but this is not always the case. This paper aims to shed some light on the performance cost incurred when an important group communication protocol, atomic broadcast, is designed with modularity in mind. We conduct our experiments with two implementations of atomic broadcast: a modular one and a monolithic one. We then measure the performance of both implementations under different system loads. Our results show that the overhead introduced by modularity is strongly related to the level of stress to which the system is subjected, and in the worst cases reaches approximately 50%.
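To make the source of such overhead concrete, here is a toy Python sketch, not the paper's implementation or framework: the same sequencing-and-delivery work is done once by two modules that communicate only through a hypothetical `EventBus`, and once in a single fused loop. The timings it prints depend on the machine and say nothing about the approximately 50% figure reported above; the point is only that each module boundary adds a lookup and an indirect call that is paid on every message.

```python
import timeit
from functools import partial
from typing import Callable

Payload = tuple[int, str]            # (sequence number, message)
Handler = Callable[[Payload], None]


class EventBus:
    """Minimal publish/subscribe bus: modules interact only through events."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def publish(self, event: str, payload: Payload) -> None:
        # Every module boundary costs a dictionary lookup and an indirect call.
        for handler in self._handlers.get(event, []):
            handler(payload)


def modular(messages: list[str]) -> list[str]:
    """Hypothetical 'sequencing' and 'delivery' modules wired through the bus."""
    bus = EventBus()
    out: list[str] = []
    next_seq = 0

    def sequencing(payload: Payload) -> None:
        nonlocal next_seq
        _, msg = payload
        seq = next_seq
        next_seq += 1
        bus.publish("ordered", (seq, msg))

    def delivery(payload: Payload) -> None:
        seq, msg = payload
        out.append(f"{seq}:{msg}")

    bus.subscribe("broadcast", sequencing)
    bus.subscribe("ordered", delivery)
    for m in messages:
        bus.publish("broadcast", (-1, m))  # -1: not sequenced yet
    return out


def monolithic(messages: list[str]) -> list[str]:
    """Same result with both steps fused into one loop, no events."""
    return [f"{seq}:{msg}" for seq, msg in enumerate(messages)]


if __name__ == "__main__":
    msgs = [f"m{i}" for i in range(1000)]
    assert modular(msgs) == monolithic(msgs)
    for fn in (monolithic, modular):
        seconds = timeit.timeit(partial(fn, msgs), number=100)
        print(f"{fn.__name__}: {seconds:.4f} s")
```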

    Architectural Issues of JMS Compliant Group Communication

    Group communication provides one-to-many communication primitives that simplify the development of highly available services. Despite advances in research and numerous prototypes, group communication remains confined to small niches. To facilitate the acceptance of group communication by a larger community, a new specification and API called JMSGroups, based on the popular Java Message Service (JMS), has previously been presented. As a follow-up, this paper focuses on the architectural issues of the JMSGroups implementation. We consider an implementation based on a JMS server, i.e., a JMS server that is modified internally to provide a group communication service. A JMS server is usually implemented as a single entity providing its service to numerous clients. However, a single-server architecture is vulnerable to failures and is not suitable for group communication. To address this problem, we discuss the issues related to replicating the JMS server (first without providing group communication). Different replicated architectures are presented and compared. Finally, we show how to construct a fault-tolerant JMSGroups system by extending the replicated JMS server with a group communication service.

    Comparing Atomic Broadcast Algorithms in High Latency Networks

    Since the introduction of the concept of failure detectors, several consensus and atomic broadcast algorithms based on these detectors have been published. The performance of these algorithms is often affected by a trade-off between the number of communication steps and the number of messages needed to reach a decision. Some algorithms reach decisions in few communication steps but require more messages to do so. Others save messages at the expense of an additional communication step to diffuse the decision to all processes in the system. This trade-off is heavily influenced by the network latency and the message processing times. Performance evaluations of these algorithms, in both simulated and real environments, have been published. These evaluations often consider a symmetrical setup: all processes are on the same network and have identical peer-to-peer latencies. In this paper, we evaluate the performance of three consensus and atomic broadcast algorithms using failure detectors in several wide area networks. We specifically focus on the case of a system with three processes, two of which are on a local area network and the third on a distant site, and examine how this setting affects the performance of all three algorithms.
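A simplified model (an assumption of this sketch, not the paper's model) of why the asymmetric three-process setup matters: with n = 3 a majority is two processes, so a coordinator that counts its own vote needs only one acknowledgement and therefore waits just for its nearest peer. The delays below are made-up values for two processes on a LAN and one on a distant site, and the hypothetical `ack_round_latency` ignores message processing time, which also shapes the trade-off.

```python
def ack_round_latency(coordinator: str,
                      one_way: dict[frozenset[str], float],
                      processes: list[str]) -> float:
    """Time for `coordinator` to send a proposal and collect a majority.

    Assumes one propose/acknowledge round, that the coordinator counts
    itself towards the majority, and negligible processing time.
    """
    n = len(processes)
    majority = n // 2 + 1
    needed = majority - 1  # acknowledgements required from other processes
    if needed == 0:
        return 0.0
    # Round-trip time to every other process, nearest first.
    rtts = sorted(2 * one_way[frozenset((coordinator, p))]
                  for p in processes if p != coordinator)
    # The decision waits for the needed-th fastest acknowledgement.
    return rtts[needed - 1]


lan, wan = 0.1, 50.0  # assumed one-way delays in ms, illustration only
delays = {
    frozenset(("A", "B")): lan,
    frozenset(("A", "C")): wan,
    frozenset(("B", "C")): wan,
}
procs = ["A", "B", "C"]
for c in procs:
    print(c, ack_round_latency(c, delays, procs))
```

In this model a coordinator on the LAN (A or B) decides after one LAN round trip, while the distant process C pays a full WAN round trip, so the placement of the coordinator dominates latency.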

    Revisiting Token-based Atomic Broadcast Algorithms

    Many atomic broadcast algorithms have been published in the last twenty years. If we exclude synchronous systems and consider only crash failures, the two main mechanisms used to tolerate failures are unreliable failure detectors and group membership. Token-based atomic broadcast algorithms represent a large class of atomic broadcast algorithms. Interestingly, all existing token-based algorithms rely on group membership. The paper presents a token-based atomic broadcast algorithm that uses a failure detector, namely the new failure detector denoted by R. The failure detector R is compared with P and S. Solving consensus with token-based algorithms using R is also discussed.
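The core idea shared by token-based ordering can be sketched in a few lines: a token circulates on a logical ring and carries the next global sequence number, and only its current holder may assign sequence numbers to its pending messages, which makes the resulting order total. This failure-free simulation is illustrative only: it has no failure detector (neither R nor group membership), no retransmission and no token-loss recovery, which is exactly the hard part the paper addresses. `token_ring_order` and its arguments are hypothetical.

```python
from collections import deque


def token_ring_order(n: int,
                     arrivals: dict[int, list[tuple[int, str]]],
                     hops: int) -> list[tuple[int, int, str]]:
    """Failure-free token-based ordering on a ring of processes 0..n-1.

    `arrivals` maps a token hop to the (sender, message) pairs broadcast by
    applications during that hop. Returns (seq, sender, message) tuples in
    the total order that every process would deliver.
    """
    queues: list[deque[str]] = [deque() for _ in range(n)]
    next_seq = 0  # value carried by the token
    holder = 0    # process 0 holds the token initially
    ordered: list[tuple[int, int, str]] = []
    for hop in range(hops):
        # Newly broadcast messages wait in their sender's local queue.
        for sender, msg in arrivals.get(hop, []):
            queues[sender].append(msg)
        # Only the token holder may assign sequence numbers.
        while queues[holder]:
            ordered.append((next_seq, holder, queues[holder].popleft()))
            next_seq += 1
        holder = (holder + 1) % n  # pass the token to the successor
    return ordered


arrivals = {0: [(2, "c1"), (0, "a1")], 1: [(1, "b1"), (0, "a2")]}
order = token_ring_order(3, arrivals, hops=6)
print(order)
```

Although "c1" is broadcast before "a1" in hop 0, "a1" is ordered first because process 0 holds the token at that moment; the order is decided by token position, not by wall-clock time.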

    Modeling and validating the performance of atomic broadcast algorithms in high-latency networks

    The performance of consensus and atomic broadcast algorithms using failure detectors is often affected by a trade-off between the number of communication steps and the number of messages needed to reach a decision. In this paper, we model the performance of three consensus and atomic broadcast algorithms using failure detectors in the often neglected setting of wide area networks, and validate this model by experimentally evaluating the algorithms in several different setups.
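The steps-versus-messages trade-off can be illustrated with a deliberately crude cost model (not the model of the paper): latency = steps × network delay + messages × per-message processing cost. The two hypothetical patterns below stand for the two families of algorithms: one decides in two steps using all-to-all acknowledgements, the other saves messages by acknowledging only to a coordinator and pays an extra step to diffuse the decision. The pattern definitions and parameter values are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Communication pattern of one consensus decision (hypothetical)."""
    name: str
    steps: int     # sequential communication steps until decision
    messages: int  # total messages exchanged


def decentralized(n: int) -> Pattern:
    # Coordinator proposes to n-1 peers, then every process acks to all others.
    return Pattern("decentralized", steps=2, messages=(n - 1) + n * (n - 1))


def centralized(n: int) -> Pattern:
    # Proposal, acks back to the coordinator only, then an extra step in
    # which the coordinator diffuses the decision.
    return Pattern("centralized", steps=3, messages=3 * (n - 1))


def estimated_latency(p: Pattern, delay: float, cost_per_msg: float) -> float:
    # Toy model: each step costs one network delay; message handling is
    # charged as if all messages were processed one after another.
    return p.steps * delay + p.messages * cost_per_msg


if __name__ == "__main__":
    n = 7
    for label, delay in (("LAN", 0.1), ("WAN", 100.0)):  # assumed delays, ms
        for p in (decentralized(n), centralized(n)):
            print(label, p.name, round(estimated_latency(p, delay, 0.05), 2))
```

With these made-up parameters the centralized pattern wins when delays are short and message handling dominates, whereas the extra step makes it lose once network delay dominates, which is the tension the abstract describes.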

    Robust TCP Connections for Fault Tolerant Computing

    When processes on two different machines communicate, they most often do so using TCP. While TCP is appropriate for a wide range of applications, it has shortcomings in other application areas. One of these areas is fault-tolerant distributed computing. For some of those applications, TCP does not address link failures adequately: TCP breaks the connection if connectivity is lost for some duration (typically minutes). This is sometimes undesirable. The paper proposes robust TCP connections as a solution to the problem of broken TCP connections: a session layer protocol on top of TCP that ensures reconnection and provides exactly-once delivery for all transmitted data. A prototype has been implemented as a Java library. The prototype has less than 10% overhead compared with plain TCP sockets with respect to the most important performance figures.
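A minimal in-memory sketch of the general idea behind such a session layer: the sender buffers data until it is acknowledged, the two sides exchange sequence numbers when they reconnect, the sender replays what is missing, and the receiver drops duplicates, so each chunk is delivered exactly once. It is not the paper's protocol or its Java library; the class names, the cumulative acknowledgements and the reconnection handshake are assumptions, and real sockets and failure detection are left out.

```python
class SessionSender:
    """Keeps every sent-but-unacknowledged chunk so it can be replayed."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.unacked: dict[int, bytes] = {}

    def send(self, data: bytes) -> tuple[int, bytes]:
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = data
        return seq, data

    def ack(self, upto: int) -> None:
        # Cumulative acknowledgement: drop every chunk with seq < upto.
        for seq in [s for s in self.unacked if s < upto]:
            del self.unacked[seq]

    def resume(self, receiver_expects: int) -> list[tuple[int, bytes]]:
        # Reconnection handshake: the receiver announces the next sequence
        # number it needs; older chunks are released, the rest is replayed.
        self.ack(receiver_expects)
        return sorted(self.unacked.items())


class SessionReceiver:
    def __init__(self) -> None:
        self.expected = 0
        self.delivered: list[bytes] = []

    def on_chunk(self, seq: int, data: bytes) -> None:
        # Deliver only the next expected chunk: duplicates (seq < expected)
        # are dropped, and anything beyond a gap will be replayed anyway.
        if seq == self.expected:
            self.delivered.append(data)
            self.expected += 1


sender, receiver = SessionSender(), SessionReceiver()
chunks = [sender.send(d) for d in (b"a", b"b", b"c", b"d")]
# First connection: two chunks arrive, then the link breaks; "c" and "d"
# are lost and only the acknowledgement for "a" reaches the sender.
for seq, data in chunks[:2]:
    receiver.on_chunk(seq, data)
sender.ack(1)
# Second connection: handshake, then replay of the missing chunks.
for seq, data in sender.resume(receiver.expected):
    receiver.on_chunk(seq, data)
print(receiver.delivered)
```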

    Molecular mechanisms underlying the control of antigenic variation in African trypanosomes

    African trypanosomes escape the host adaptive immune response by switching their dense protective coat of Variant Surface Glycoprotein (VSG). Each cell expresses only one VSG gene at a time from a telomeric expression site (ES). The 'pre-genomic' era saw the identification of the range of pathways involving VSG recombination in the context of mono-telomeric VSG transcription. A prominent feature of the early post-genomic era is the description of the molecular machineries involved in these processes. We describe the factors and sequences recently linked to mutually exclusive transcription and VSG recombination, and how these act in the control of the key virulence mechanism of antigenic variation.

    The JmjC domain protein Epe1 prevents unregulated assembly and disassembly of heterochromatin

    Heterochromatin normally has prescribed chromosomal positions and must not encroach on adjacent regions. We demonstrate that the fission yeast protein Epe1 stabilises silent chromatin, preventing the oscillation of heterochromatin domains. Epe1 loss leads to two contrasting phenotypes: alleviation of silencing within heterochromatin and expansion of silent chromatin into neighbouring euchromatin. Thus, we propose that Epe1 regulates heterochromatin assembly and disassembly, thereby affecting heterochromatin integrity, centromere function and chromosome segregation fidelity. Epe1 regulates the extent of heterochromatin domains at the level of chromatin, not via the RNAi pathway. Analysis of an ectopically silenced site suggests that heterochromatin oscillation occurs in the absence of heterochromatin boundaries. Epe1 requires predicted iron- and 2-oxoglutarate (2-OG)-binding residues for in vivo function, indicating that it is probably a 2-OG/Fe(II)-dependent dioxygenase. We suggest that, rather than being a histone demethylase, Epe1 may be a protein hydroxylase that affects the stability of a heterochromatin protein, or a protein–protein interaction, to regulate the extent of heterochromatin domains. Thus, Epe1 ensures that heterochromatin is restricted to the domains to which it is targeted by RNAi.